Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 3995 |
| Missing cells | 2569 |
| Missing cells (%) | 5.8% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 343.4 KiB |
| Average record size in memory | 88.0 B |
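The derived figures in this table follow from the raw counts. A minimal sketch of the arithmetic, assuming the missing percentage is taken over all cells (observations × variables) and that 1 KiB is 1024 bytes; the variable names are illustrative only:

```python
# Raw counts copied from the table above.
n_variables = 11
n_observations = 3995
missing_cells = 2569
total_size_kib = 343.4

# Missing cells (%) is measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average record size: total bytes divided by the number of rows.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{missing_pct:.1f}%")        # 5.8%
print(f"{avg_record_bytes:.1f} B")  # 88.0 B
```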
Variable types
| Categorical | 10 |
|---|---|
| Numeric | 1 |
| Alert | Type |
|---|---|
| newsdesk has a high cardinality: 55 distinct values | High cardinality |
| subsection has a high cardinality: 53 distinct values | High cardinality |
| headline has a high cardinality: 3962 distinct values | High cardinality |
| abstract has a high cardinality: 3949 distinct values | High cardinality |
| keywords has a high cardinality: 3701 distinct values | High cardinality |
| pub_date has a high cardinality: 3663 distinct values | High cardinality |
| uniqueID has a high cardinality: 3995 distinct values | High cardinality |
| is_popular is highly correlated with section and 2 other fields | High correlation |
| section is highly correlated with is_popular and 2 other fields | High correlation |
| newsdesk is highly correlated with is_popular and 3 other fields | High correlation |
| material is highly correlated with newsdesk and 1 other field | High correlation |
| subsection is highly correlated with is_popular and 3 other fields | High correlation |
| newsdesk is highly correlated with section and 3 other fields | High correlation |
| section is highly correlated with newsdesk and 3 other fields | High correlation |
| subsection is highly correlated with newsdesk and 4 other fields | High correlation |
| material is highly correlated with newsdesk and 2 other fields | High correlation |
| word_count is highly correlated with subsection | High correlation |
| is_popular is highly correlated with newsdesk and 2 other fields | High correlation |
| subsection has 2569 (64.3%) missing values | Missing |
| headline is uniformly distributed | Uniform |
| abstract is uniformly distributed | Uniform |
| pub_date is uniformly distributed | Uniform |
| uniqueID is uniformly distributed | Uniform |
| uniqueID has unique values | Unique |
| word_count has 121 (3.0%) zeros | Zeros |
Reproduction
| Analysis started | 2021-11-23 03:27:02.972648 |
|---|---|
| Analysis finished | 2021-11-23 03:27:10.437347 |
| Duration | 7.46 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
newsdesk
| Distinct | 55 |
|---|---|
| Distinct (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| OpEd | 440 |
|---|---|
| Culture | 277 |
| Washington | 231 |
| Foreign | 207 |
| Science | 205 |
| Other values (50) | 2635 |
Length
| Max length | 15 |
|---|---|
| Median length | 7 |
| Mean length | 7.063078849 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 8 ? |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | OpEd |
|---|---|
| 2nd row | OpEd |
| 3rd row | OpEd |
| 4th row | Games |
| 5th row | Sports |
Common Values
| Value | Count | Frequency (%) |
| OpEd | 440 | 11.0% |
| Culture | 277 | 6.9% |
| Washington | 231 | 5.8% |
| Foreign | 207 | 5.2% |
| Science | 205 | 5.1% |
| Business | 201 | 5.0% |
| Learning | 189 | 4.7% |
| Metro | 180 | 4.5% |
| Politics | 179 | 4.5% |
| Sports | 165 | 4.1% |
| Other values (45) | 1721 | 43.1% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| oped | 440 | 11.0% |
| culture | 277 | 6.9% |
| washington | 231 | 5.8% |
| foreign | 207 | 5.2% |
| science | 205 | 5.1% |
| business | 204 | 5.1% |
| learning | 189 | 4.7% |
| metro | 180 | 4.5% |
| politics | 179 | 4.5% |
| sports | 165 | 4.1% |
| Other values (46) | 1739 |
section
| Distinct | 36 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| U.S. | 604 |
|---|---|
| Opinion | 494 |
| Arts | 273 |
| New York | 229 |
| World | 226 |
| Other values (31) | 2169 |
Length
| Max length | 20 |
|---|---|
| Median length | 7 |
| Mean length | 7.579474343 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 5 ? |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Opinion |
|---|---|
| 2nd row | Opinion |
| 3rd row | Opinion |
| 4th row | Crosswords & Games |
| 5th row | Sports |
Common Values
| Value | Count | Frequency (%) |
| U.S. | 604 | 15.1% |
| Opinion | 494 | 12.4% |
| Arts | 273 | 6.8% |
| New York | 229 | 5.7% |
| World | 226 | 5.7% |
| The Learning Network | 198 | 5.0% |
| Business Day | 196 | 4.9% |
| Sports | 170 | 4.3% |
| Real Estate | 162 | 4.1% |
| Well | 148 | 3.7% |
| Other values (26) | 1295 | 32.4% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| u.s | 604 | 11.2% |
| opinion | 494 | 9.1% |
| the | 289 | 5.4% |
| arts | 273 | 5.1% |
| new | 229 | 4.2% |
| york | 229 | 4.2% |
| world | 226 | 4.2% |
| learning | 198 | 3.7% |
| network | 198 | 3.7% |
| business | 196 | 3.6% |
| Other values (37) | 2464 |
subsection
| Distinct | 53 |
|---|---|
| Distinct (%) | 3.7% |
| Missing | 2569 |
| Missing (%) | 64.3% |
| Memory size | 31.3 KiB |
| Politics | 373 |
|---|---|
| Television | 110 |
| Europe | 79 |
| The Daily | 78 |
| Music | 59 |
| Other values (48) | 727 |
Length
| Max length | 22 |
|---|---|
| Median length | 8 |
| Mean length | 8.683730715 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 11 ? |
|---|---|
| Unique (%) | 0.8% |
Sample
| 1st row | Pro Football |
|---|---|
| 2nd row | Politics |
| 3rd row | Television |
| 4th row | Mind |
| 5th row | Wine, Beer & Cocktails |
Common Values
| Value | Count | Frequency (%) |
| Politics | 373 | 9.3% |
| Television | 110 | 2.8% |
| Europe | 79 | 2.0% |
| The Daily | 78 | 2.0% |
| Music | 59 | 1.5% |
| Sunday Review | 58 | 1.5% |
| Family | 58 | 1.5% |
| Asia Pacific | 57 | 1.4% |
| Art & Design | 48 | 1.2% |
| Pro Football | 47 | 1.2% |
| Other values (43) | 459 | 11.5% |
| (Missing) | 2569 | 64.3% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| politics | 374 | |
| television | 110 | 5.6% |
| review | 93 | 4.8% |
| europe | 79 | 4.1% |
| 78 | 4.0% | |
| the | 78 | 4.0% |
| daily | 78 | 4.0% |
| pro | 64 | 3.3% |
| music | 59 | 3.0% |
| family | 58 | 3.0% |
| Other values (60) | 878 |
material
| Distinct | 9 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| News | 3154 |
|---|---|
| Op-Ed | 456 |
| Interactive Feature | 121 |
| Review | 115 |
| briefing | 60 |
| Other values (4) | 89 |
Length
| Max length | 19 |
|---|---|
| Median length | 4 |
| Mean length | 4.877596996 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Op-Ed |
|---|---|
| 2nd row | Op-Ed |
| 3rd row | Op-Ed |
| 4th row | News |
| 5th row | News |
Common Values
| Value | Count | Frequency (%) |
| News | 3154 | 78.9% |
| Op-Ed | 456 | 11.4% |
| Interactive Feature | 121 | 3.0% |
| Review | 115 | 2.9% |
| briefing | 60 | 1.5% |
| Obituary (Obit) | 47 | 1.2% |
| Editorial | 29 | 0.7% |
| News Analysis | 11 | 0.3% |
| Letter | 2 | 0.1% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| news | 3165 | |
| op-ed | 456 | 10.9% |
| interactive | 121 | 2.9% |
| feature | 121 | 2.9% |
| review | 115 | 2.8% |
| briefing | 60 | 1.4% |
| obituary | 47 | 1.1% |
| obit | 47 | 1.1% |
| editorial | 29 | 0.7% |
| analysis | 11 | 0.3% |
headline
| Distinct | 3962 |
|---|---|
| Distinct (%) | 99.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| Variety: Acrostic | 6 |
|---|---|
| Homes for Sale in New York and New Jersey | 6 |
| Homes for Sale in New York and Connecticut | 6 |
| What the Heck Is That? | 4 |
| Homes for Sale in Brooklyn, Queens and Manhattan | 4 |
| Other values (3957) | 3969 |
Length
| Max length | 123 |
|---|---|
| Median length | 56 |
| Mean length | 53.25957447 |
| Min length | 3 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3947 ? |
|---|---|
| Unique (%) | 98.8% |
Sample
| 1st row | Anyone Else Want to See Trump ‘Shut Up’? |
|---|---|
| 2nd row | Trump Calls on Extremists to ‘Stand By’ |
| 3rd row | Can Mike Espy Make History, Again? |
| 4th row | In Which Rikishi Wear Mawashi |
| 5th row | N.F.L. Week 4 Predictions: Our Picks Against the Spread |
Common Values
| Value | Count | Frequency (%) |
| Variety: Acrostic | 6 | 0.2% |
| Homes for Sale in New York and New Jersey | 6 | 0.2% |
| Homes for Sale in New York and Connecticut | 6 | 0.2% |
| What the Heck Is That? | 4 | 0.1% |
| Homes for Sale in Brooklyn, Queens and Manhattan | 4 | 0.1% |
| The Crossword Stumper | 3 | 0.1% |
| Homes for Sale in Brooklyn, Manhattan and Queens | 3 | 0.1% |
| $1.6 Million Homes in California | 2 | 0.1% |
| Homes for Sale in Brooklyn, Manhattan and the Bronx | 2 | 0.1% |
| Homes for Sale in Brooklyn, Manhattan and Staten Island | 2 | 0.1% |
| Other values (3952) | 3957 | 99.0% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| the | 1580 | 4.4% |
| a | 1018 | 2.8% |
| to | 836 | 2.3% |
| in | 774 | 2.2% |
| of | 718 | 2.0% |
| and | 587 | 1.6% |
| for | 483 | 1.3% |
| is | 383 | 1.1% |
| trump | 297 | 0.8% |
| how | 284 | 0.8% |
| Other values (8309) | 28954 |
abstract
| Distinct | 3949 |
|---|---|
| Distinct (%) | 98.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| What is this image saying? | 11 |
|---|---|
| Look closely at this image, stripped of its caption, and join the moderated conversation about what you and other students see. | 10 |
| Teenage comments in response to our recent writing prompts, and an invitation to join the ongoing conversation. | 9 |
| What story does this image inspire for you? | 8 |
| Our critics and writers have selected noteworthy cultural events to experience virtually or in person in New York City. | 4 |
| Other values (3944) | 3953 |
Length
| Max length | 626 |
|---|---|
| Median length | 132 |
| Mean length | 129.5048811 |
| Min length | 18 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3938 ? |
|---|---|
| Unique (%) | 98.6% |
Sample
| 1st row | Our president as a terrible toddler. |
|---|---|
| 2nd row | Instead of condemning violent groups, the president marshals them. |
| 3rd row | If the Democratic Party claims to value Black support, then they should work harder to make it happen. |
| 4th row | Adam Fromm is on the line. |
| 5th row | Tom Brady and the Buccaneers are building momentum and the Bears hope to continue an improbable start. Two games — Chiefs-Patriots and Titans-Steelers — have been postponed. |
Common Values
| Value | Count | Frequency (%) |
| What is this image saying? | 11 | 0.3% |
| Look closely at this image, stripped of its caption, and join the moderated conversation about what you and other students see. | 10 | 0.3% |
| Teenage comments in response to our recent writing prompts, and an invitation to join the ongoing conversation. | 9 | 0.2% |
| What story does this image inspire for you? | 8 | 0.2% |
| Our critics and writers have selected noteworthy cultural events to experience virtually or in person in New York City. | 4 | 0.1% |
| A look at one of the entries that fooled solvers in last week’s puzzles. | 3 | 0.1% |
| A look at one of the entries from last week’s puzzles that stumped our solvers. | 3 | 0.1% |
| Recent residential sales in New York City and the region. | 3 | 0.1% |
| Trees appear to communicate and cooperate through subterranean networks of fungi. What are they sharing with one another? | 2 | 0.1% |
| We chart the trials of a tavern in Oakland, Calif., that was thriving until the pandemic brought economic and emotional turmoil. | 2 | 0.1% |
| Other values (3939) | 3940 | 98.6% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| the | 5037 | 5.9% |
| a | 2517 | 3.0% |
| of | 2287 | 2.7% |
| to | 2259 | 2.7% |
| and | 2193 | 2.6% |
| in | 1869 | 2.2% |
| for | 850 | 1.0% |
| is | 740 | 0.9% |
| that | 654 | 0.8% |
| are | 638 | 0.7% |
| Other values (13625) | 66138 |
keywords
| Distinct | 3701 |
|---|---|
| Distinct (%) | 92.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| [] | 205 |
|---|---|
| ['Crossword Puzzles'] | 53 |
| ['New York City'] | 14 |
| ['Television', 'Fargo (TV Program)'] | 8 |
| ['Customs, Etiquette and Manners', 'Content Type: Service'] | 4 |
| Other values (3696) | 3711 |
Length
| Max length | 1381 |
|---|---|
| Median length | 166 |
| Mean length | 176.1924906 |
| Min length | 2 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3684 ? |
|---|---|
| Unique (%) | 92.2% |
Sample
| 1st row | ['Presidential Election of 2020', 'Biden, Joseph R Jr', 'Trump, Donald J', 'Debates (Political)'] |
|---|---|
| 2nd row | ['Presidential Election of 2020', 'United States Politics and Government', 'Right-Wing Extremism and Alt-Right', 'Fringe Groups and Movements', 'Whites', 'Debates (Political)', 'Demonstrations, Protests and Riots', 'Trump, Donald J', 'United States'] |
| 3rd row | ['Black People', 'Blacks', 'Presidential Election of 2020', 'United States Politics and Government', 'State Legislatures', 'Elections, Senate', 'Democratic Party', 'Republican Party', 'Senate', 'Espy, Mike', 'Mississippi'] |
| 4th row | ['Crossword Puzzles'] |
| 5th row | ['Football', 'New England Patriots', 'Kansas City Chiefs', 'Los Angeles Chargers', 'Tampa Bay Buccaneers', 'Indianapolis Colts', 'Chicago Bears', 'Buffalo Bills', 'Las Vegas Raiders', 'Tennessee Titans', 'Pittsburgh Steelers', 'Mahomes, Patrick (1995- )', 'Baltimore Ravens'] |
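The sample values look like Python list literals stored as strings, which would explain why the report treats the column as one high-cardinality categorical. A minimal sketch of turning such strings back into lists, assuming every value is a valid list literal (an assumption based only on the samples shown here):

```python
import ast
from collections import Counter

# Two values copied from the sample above, plus the empty-list case.
raw_values = [
    "['Presidential Election of 2020', 'Biden, Joseph R Jr', 'Trump, Donald J', 'Debates (Political)']",
    "['Crossword Puzzles']",
    "[]",
]

# literal_eval only evaluates Python literals, unlike eval.
parsed = [ast.literal_eval(value) for value in raw_values]

# Count individual keywords rather than whole list strings.
keyword_counts = Counter(keyword for keywords in parsed for keyword in keywords)

print([len(keywords) for keywords in parsed])  # [4, 1, 0]
print(keyword_counts["Crossword Puzzles"])     # 1
```

Counting individual keywords this way gives a per-keyword frequency instead of a frequency per distinct list string.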
Common Values
| Value | Count | Frequency (%) |
| [] | 205 | 5.1% |
| ['Crossword Puzzles'] | 53 | 1.3% |
| ['New York City'] | 14 | 0.4% |
| ['Television', 'Fargo (TV Program)'] | 8 | 0.2% |
| ['Customs, Etiquette and Manners', 'Content Type: Service'] | 4 | 0.1% |
| ['Football', 'National Football League'] | 3 | 0.1% |
| ['Television', 'The Mandalorian (TV Program)'] | 3 | 0.1% |
| ['Customs, Etiquette and Manners'] | 3 | 0.1% |
| ['Presidential Election of 2020', 'United States Politics and Government', 'Biden, Joseph R Jr', 'Trump, Donald J'] | 2 | 0.1% |
| ['internal-essential'] | 2 | 0.1% |
| Other values (3691) | 3698 | 92.6% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| and | 6504 | 7.9% |
| of | 1357 | 1.6% |
| states | 1342 | 1.6% |
| united | 1302 | 1.6% |
| coronavirus | 1261 | 1.5% |
| 2019-ncov | 977 | 1.2% |
| politics | 971 | 1.2% |
| government | 952 | 1.2% |
| 2020 | 943 | 1.1% |
| election | 889 | 1.1% |
| Other values (8747) | 66055 |
word_count
| Distinct | 1691 |
|---|---|
| Distinct (%) | 42.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1247.067084 |
| Minimum | 0 |
|---|---|
| Maximum | 15619 |
| Zeros | 121 |
| Zeros (%) | 3.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 31.3 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 214 |
| Q1 | 876.5 |
| median | 1188 |
| Q3 | 1481 |
| 95-th percentile | 2355.5 |
| Maximum | 15619 |
| Range | 15619 |
| Interquartile range (IQR) | 604.5 |
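The range and IQR rows follow directly from the quantiles above them. A small sketch of that arithmetic; the 1.5 × IQR fences are Tukey's rule of thumb, added here as an assumption for illustration and not something the report computes:

```python
# Quantile statistics for word_count, copied from the table above.
minimum, q1, median, q3, maximum = 0, 876.5, 1188, 1481, 15619

value_range = maximum - minimum  # 15619
iqr = q3 - q1                    # 604.5

# Tukey's rule of thumb (an assumption here, not part of the report):
# values beyond 1.5 * IQR from the quartiles are candidate outliers.
lower_fence = q1 - 1.5 * iqr     # -30.25
upper_fence = q3 + 1.5 * iqr     # 2387.75

print(value_range, iqr, lower_fence, upper_fence)
```

Under these fences the maximum of 15619 lies far above the upper fence, while the 95th percentile (2355.5) sits just below it.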
Descriptive statistics
| Standard deviation | 815.9134971 |
|---|---|
| Coefficient of variation (CV) | 0.6542659234 |
| Kurtosis | 42.38546579 |
| Mean | 1247.067084 |
| Median Absolute Deviation (MAD) | 305 |
| Skewness | 4.260082579 |
| Sum | 4982033 |
| Variance | 665714.8348 |
| Monotonicity | Not monotonic |
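Several of these descriptive statistics are tied to each other by simple identities. A short sketch that re-derives them from the mean, the standard deviation and the number of observations (3995, from the dataset statistics), as a consistency check rather than a description of how the tool computes them:

```python
# Values copied from the tables above.
n = 3995
mean = 1247.067084
std = 815.9134971

cv = std / mean      # coefficient of variation
variance = std ** 2  # variance is the squared standard deviation
total = mean * n     # sum of all word counts

print(round(cv, 4))        # 0.6543
print(round(variance, 1))  # 665714.8
print(round(total))        # 4982033
```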
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 121 | 3.0% |
| 893 | 12 | 0.3% |
| 1255 | 10 | 0.3% |
| 1277 | 10 | 0.3% |
| 1190 | 9 | 0.2% |
| 1252 | 9 | 0.2% |
| 1136 | 9 | 0.2% |
| 909 | 9 | 0.2% |
| 896 | 8 | 0.2% |
| 1337 | 8 | 0.2% |
| Other values (1681) | 3790 | 94.9% |
| Value | Count | Frequency (%) |
| 0 | 121 | 3.0% |
| 16 | 1 | < 0.1% |
| 103 | 2 | 0.1% |
| 114 | 1 | < 0.1% |
| 116 | 1 | < 0.1% |
| 124 | 1 | < 0.1% |
| 126 | 3 | 0.1% |
| 128 | 1 | < 0.1% |
| 130 | 2 | 0.1% |
| 131 | 2 | 0.1% |
| Value | Count | Frequency (%) |
| 15619 | 1 | < 0.1% |
| 10496 | 1 | < 0.1% |
| 8423 | 1 | < 0.1% |
| 8384 | 1 | < 0.1% |
| 8223 | 1 | < 0.1% |
| 7815 | 1 | < 0.1% |
| 7759 | 1 | < 0.1% |
| 7677 | 1 | < 0.1% |
| 7550 | 1 | < 0.1% |
| 7518 | 1 | < 0.1% |
pub_date
| Distinct | 3663 |
|---|---|
| Distinct (%) | 91.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| 2020-10-14 09:00:29+00:00 | 5 |
|---|---|
| 2020-11-13 10:00:21+00:00 | 5 |
| 2020-11-20 10:00:25+00:00 | 4 |
| 2020-12-23 10:00:32+00:00 | 4 |
| 2020-11-16 10:00:08+00:00 | 4 |
| Other values (3658) | 3973 |
Length
| Max length | 25 |
|---|---|
| Median length | 25 |
| Mean length | 25 |
| Min length | 25 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3407 ? |
|---|---|
| Unique (%) | 85.3% |
Sample
| 1st row | 2020-10-01 00:05:51+00:00 |
|---|---|
| 2nd row | 2020-10-01 00:43:28+00:00 |
| 3rd row | 2020-10-01 00:45:17+00:00 |
| 4th row | 2020-10-01 02:00:05+00:00 |
| 5th row | 2020-10-01 04:01:16+00:00 |
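The report profiles this column as a 25-character categorical string, but the values are ISO 8601 timestamps with a UTC offset. A minimal sketch of parsing them with the standard library, assuming every value follows the layout seen in the sample; the hour-of-day count is only an illustrative derived feature:

```python
from collections import Counter
from datetime import datetime

# Values copied from the sample above.
raw_values = [
    "2020-10-01 00:05:51+00:00",
    "2020-10-01 00:43:28+00:00",
    "2020-10-01 02:00:05+00:00",
]

# fromisoformat accepts this layout, including the +00:00 offset.
timestamps = [datetime.fromisoformat(value) for value in raw_values]

# Example derived feature: publication hour (UTC) per article.
hours = Counter(ts.hour for ts in timestamps)

print(timestamps[0].utcoffset())  # 0:00:00
print(sorted(hours.items()))      # [(0, 2), (2, 1)]
```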
Common Values
| Value | Count | Frequency (%) |
| 2020-10-14 09:00:29+00:00 | 5 | 0.1% |
| 2020-11-13 10:00:21+00:00 | 5 | 0.1% |
| 2020-11-20 10:00:25+00:00 | 4 | 0.1% |
| 2020-12-23 10:00:32+00:00 | 4 | 0.1% |
| 2020-11-16 10:00:08+00:00 | 4 | 0.1% |
| 2020-11-12 10:00:29+00:00 | 4 | 0.1% |
| 2020-11-24 10:00:21+00:00 | 4 | 0.1% |
| 2020-11-04 10:00:18+00:00 | 4 | 0.1% |
| 2020-12-11 10:00:25+00:00 | 4 | 0.1% |
| 2020-12-23 10:00:11+00:00 | 4 | 0.1% |
| Other values (3653) | 3953 | 98.9% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| 2020-10-28 | 72 | 0.9% |
| 2020-12-23 | 72 | 0.9% |
| 2020-12-15 | 70 | 0.9% |
| 2020-11-02 | 70 | 0.9% |
| 2020-12-09 | 68 | 0.9% |
| 2020-12-02 | 66 | 0.8% |
| 2020-10-13 | 66 | 0.8% |
| 2020-10-30 | 65 | 0.8% |
| 2020-11-10 | 65 | 0.8% |
| 2020-11-23 | 64 | 0.8% |
| Other values (2380) | 7312 |
is_popular
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| 0 | 2064 |
|---|---|
| 1 | 1931 |
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 ? |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 1 |
| 5th row | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 2064 | 51.7% |
| 1 | 1931 | 48.3% |
Length
Histogram of lengths of the category
Pie chart
| Value | Count | Frequency (%) |
| 0 | 2064 | 51.7% |
| 1 | 1931 | 48.3% |
uniqueID
| Distinct | 3995 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 31.3 KiB |
| nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f | 1 |
|---|---|
| nyt://article/ef3d1eb3-6297-5711-bb2a-27bee6fc6831 | 1 |
| nyt://article/ca3eaa6a-be4d-5a0d-b368-070772436120 | 1 |
| nyt://article/fe6bcb27-806e-5b82-bf26-fd954254c1a5 | 1 |
| nyt://article/bef36e56-8080-5278-80a9-8927bee7c0b3 | 1 |
| Other values (3990) | 3990 |
Length
| Max length | 54 |
|---|---|
| Median length | 50 |
| Mean length | 50.12115144 |
| Min length | 50 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 ? |
| Distinct scripts | 0 ? |
| Distinct blocks | 0 ? |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3995 ? |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f |
|---|---|
| 2nd row | nyt://article/9a7ef9e0-1334-56b2-a7f1-288c48873b06 |
| 3rd row | nyt://article/4bb2b763-0088-5e10-b204-19e404f744ec |
| 4th row | nyt://article/0d96205f-edb8-5f1f-8c44-1ddf6ed56d1a |
| 5th row | nyt://article/afc8295b-3c22-5a5f-9539-3f77b7b8eeeb |
Common Values
| Value | Count | Frequency (%) |
| nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f | 1 | < 0.1% |
| nyt://article/ef3d1eb3-6297-5711-bb2a-27bee6fc6831 | 1 | < 0.1% |
| nyt://article/ca3eaa6a-be4d-5a0d-b368-070772436120 | 1 | < 0.1% |
| nyt://article/fe6bcb27-806e-5b82-bf26-fd954254c1a5 | 1 | < 0.1% |
| nyt://article/bef36e56-8080-5278-80a9-8927bee7c0b3 | 1 | < 0.1% |
| nyt://article/cf01199e-51f2-5a0d-b32a-6ec7ade4ff29 | 1 | < 0.1% |
| nyt://article/59d2cf49-531e-5263-a738-1c4488c0fb84 | 1 | < 0.1% |
| nyt://article/a0339f93-7ae8-5840-a825-642ab1f2ba02 | 1 | < 0.1% |
| nyt://article/9caf41fa-de3b-5bfa-8df9-290b41f1ad87 | 1 | < 0.1% |
| nyt://article/edfb7f02-ac48-5b65-ad7f-062e6cd36189 | 1 | < 0.1% |
| Other values (3985) | 3985 | 99.7% |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f | 1 | < 0.1% |
| nyt://article/f4c9673b-8374-518b-b0a1-25cfae0921d0 | 1 | < 0.1% |
| nyt://article/c3295d25-e296-571a-a3dd-c7f56f317c32 | 1 | < 0.1% |
| nyt://article/138f58dc-301a-586a-bb94-8c010d0e789f | 1 | < 0.1% |
| nyt://article/4bb2b763-0088-5e10-b204-19e404f744ec | 1 | < 0.1% |
| nyt://article/0d96205f-edb8-5f1f-8c44-1ddf6ed56d1a | 1 | < 0.1% |
| nyt://article/afc8295b-3c22-5a5f-9539-3f77b7b8eeeb | 1 | < 0.1% |
| nyt://article/27e40157-1790-59fc-8153-11cc88950152 | 1 | < 0.1% |
| nyt://article/db8a2622-8509-5c2a-a8fe-6cb1ec8d0989 | 1 | < 0.1% |
| nyt://article/c1695e32-4822-51aa-958c-52d9ebacabcd | 1 | < 0.1% |
| Other values (3985) | 3985 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
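The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the helper names and the sample data are hypothetical, and ties are given average ranks, which is one common convention:

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman(x, y))  # approximately 1.0 for a strictly increasing relationship
print(pearson(x, y))   # below 1 because the relationship is not linear
```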
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
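A small sketch of this definition, together with a check of the invariance property. The function name and sample values are hypothetical, and the rescaling uses positive factors only, since a negative factor would flip the sign of r:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical sample values, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)  # the two values agree up to floating-point rounding
```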
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
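The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch with hypothetical data; library implementations often use a tie-adjusted variant (tau-b), and which one the report uses is not stated here:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical rankings of five items by two judges.
x = [1, 2, 3, 4, 5]
y = [2, 1, 3, 5, 4]
print(kendall_tau_a(x, y))  # 8 concordant, 2 discordant, 10 pairs -> 0.6
```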
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
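A sketch of the bias-corrected Cramér's V in plain Python, following the commonly cited form of Bergsma's correction. The function name and the sample labels are hypothetical, and this is not necessarily how the report's own implementation is written:

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    row_totals = Counter(x)
    col_totals = Counter(y)
    observed = Counter(zip(x, y))

    # Pearson's chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # The correction shrinks phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical labels: y_same is fully determined by x, y_indep is unrelated to it.
x = ["a", "a", "b", "b"] * 25
y_same = ["u", "u", "v", "v"] * 25
y_indep = ["u", "v", "u", "v"] * 25

print(cramers_v_corrected(x, y_same))   # close to 1: perfect association
print(cramers_v_corrected(x, y_indep))  # 0.0: no association
```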
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
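To make the idea of nullity concrete, here is a minimal sketch that builds a presence mask and per-column missing counts. The rows are four of the first rows of this dataset reduced to three columns, with NaN written as None; the text rendering is only a stand-in for the report's plots:

```python
# Four of the dataset's first rows, reduced to three columns; None marks a missing value.
rows = [
    {"newsdesk": "OpEd", "section": "Opinion", "subsection": None},
    {"newsdesk": "Games", "section": "Crosswords & Games", "subsection": None},
    {"newsdesk": "Sports", "section": "Sports", "subsection": "Pro Football"},
    {"newsdesk": "Politics", "section": "U.S.", "subsection": "Politics"},
]
columns = ["newsdesk", "section", "subsection"]

# Nullity matrix: one row per record, True where the value is present.
nullity_matrix = [[row[col] is not None for col in columns] for row in rows]

# Nullity by column: number of missing values in each column.
missing_by_column = {
    col: sum(1 for row in rows if row[col] is None) for col in columns
}

for present in nullity_matrix:
    # Render present cells as '#' and missing cells as '.'.
    print("".join("#" if cell else "." for cell in present))
print(missing_by_column)
```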
First rows
| | newsdesk | section | subsection | material | headline | abstract | keywords | word_count | pub_date | is_popular | uniqueID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OpEd | Opinion | NaN | Op-Ed | Anyone Else Want to See Trump ‘Shut Up’? | Our president as a terrible toddler. | ['Presidential Election of 2020', 'Biden, Joseph R Jr', 'Trump, Donald J', 'Debates (Political)'] | 925 | 2020-10-01 00:05:51+00:00 | 1 | nyt://article/e467c2ae-2df3-5836-a6ca-b23d0d335e4f |
| 1 | OpEd | Opinion | NaN | Op-Ed | Trump Calls on Extremists to ‘Stand By’ | Instead of condemning violent groups, the president marshals them. | ['Presidential Election of 2020', 'United States Politics and Government', 'Right-Wing Extremism and Alt-Right', 'Fringe Groups and Movements', 'Whites', 'Debates (Political)', 'Demonstrations, Protests and Riots', 'Trump, Donald J', 'United States'] | 902 | 2020-10-01 00:43:28+00:00 | 1 | nyt://article/9a7ef9e0-1334-56b2-a7f1-288c48873b06 |
| 2 | OpEd | Opinion | NaN | Op-Ed | Can Mike Espy Make History, Again? | If the Democratic Party claims to value Black support, then they should work harder to make it happen. | ['Black People', 'Blacks', 'Presidential Election of 2020', 'United States Politics and Government', 'State Legislatures', 'Elections, Senate', 'Democratic Party', 'Republican Party', 'Senate', 'Espy, Mike', 'Mississippi'] | 1412 | 2020-10-01 00:45:17+00:00 | 1 | nyt://article/4bb2b763-0088-5e10-b204-19e404f744ec |
| 3 | Games | Crosswords & Games | NaN | News | In Which Rikishi Wear Mawashi | Adam Fromm is on the line. | ['Crossword Puzzles'] | 849 | 2020-10-01 02:00:05+00:00 | 1 | nyt://article/0d96205f-edb8-5f1f-8c44-1ddf6ed56d1a |
| 4 | Sports | Sports | Pro Football | News | N.F.L. Week 4 Predictions: Our Picks Against the Spread | Tom Brady and the Buccaneers are building momentum and the Bears hope to continue an improbable start. Two games — Chiefs-Patriots and Titans-Steelers — have been postponed. | ['Football', 'New England Patriots', 'Kansas City Chiefs', 'Los Angeles Chargers', 'Tampa Bay Buccaneers', 'Indianapolis Colts', 'Chicago Bears', 'Buffalo Bills', 'Las Vegas Raiders', 'Tennessee Titans', 'Pittsburgh Steelers', 'Mahomes, Patrick (1995- )', 'Baltimore Ravens'] | 2690 | 2020-10-01 04:01:16+00:00 | 0 | nyt://article/afc8295b-3c22-5a5f-9539-3f77b7b8eeeb |
| 5 | Learning | The Learning Network | NaN | News | Confrontation | What story does this image inspire for you? | [] | 161 | 2020-10-01 07:00:02+00:00 | 0 | nyt://article/27e40157-1790-59fc-8153-11cc88950152 |
| 6 | Politics | U.S. | Politics | News | For Voters Still Mulling, One Thing Is Clear: That Debate Didn’t Help | A small but crucial segment of likely voters say they remain uncommitted — to a candidate or to voting at all — and nothing they heard on Tuesday clinched things for them. | ['Presidential Election of 2020', 'Debates (Political)', 'Voting and Voters', 'Democratic Party', 'Republican Party', 'Trump, Donald J', 'Biden, Joseph R Jr'] | 1270 | 2020-10-01 07:00:06+00:00 | 1 | nyt://article/db8a2622-8509-5c2a-a8fe-6cb1ec8d0989 |
| 7 | Culture | Arts | Television | News | After ‘The Salisbury Poisonings,’ Locals Picked Up the Pieces | A new AMC show dramatizes the 2018 poisoning of a former Russian spy in Britain. Even for a reporter who covered the real events, the four episodes contain revelations. | ['Television', 'The Salisbury Poisonings (TV Program)', 'Lawn, Declan', 'Patterson, Adam (Filmmaker)', 'AMC (TV Network)', 'Skripal, Sergei V', 'Poisoning and Poisons', 'Assassinations and Attempted Assassinations', 'Espionage and Intelligence Services', 'News and News Media', 'Russia', 'Great Britain', 'Sturgess, Dawn', 'Salisbury (England)'] | 1165 | 2020-10-01 08:13:43+00:00 | 0 | nyt://article/c1695e32-4822-51aa-958c-52d9ebacabcd |
| 8 | Learning | The Learning Network | NaN | News | Are You Having a Tough Time Maintaining Friendships These Days? | Has the pandemic brought you closer together with friends? Or moved you farther apart? | [] | 918 | 2020-10-01 09:00:03+00:00 | 1 | nyt://article/18dcd4cf-e1c8-5741-af33-38d1198e44e1 |
| 9 | Magazine | Magazine | NaN | News | Distance Learning, With Shades of Big Brother | A video on digital classroom etiquette makes it very clear: Your home is no longer your own, and your kids must pretend to learn in it. | ['E-Learning', 'Children and Childhood', 'Education (K-12)', 'Quarantine (Life and Culture)', 'Customs, Etiquette and Manners'] | 1238 | 2020-10-01 09:00:04+00:00 | 1 | nyt://article/c617ba64-da38-5ee3-95d4-aa12d157d741 |
Last rows
| | newsdesk | section | subsection | material | headline | abstract | keywords | word_count | pub_date | is_popular | uniqueID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3985 | RealEstate | Real Estate | NaN | News | Homes for Sale in New York and Connecticut | This week’s properties include a five-bedroom in Great Neck, N.Y., and a three-bedroom in Fairfield, Conn. | ['Real Estate and Housing (Residential)', 'Great Neck (NY)', 'Fairfield (Conn)'] | 124 | 2020-12-31 14:00:20+00:00 | 0 | nyt://article/b0e4fe64-8bee-5a38-81c8-c5ecc595ec9a |
| 3986 | RealEstate | Real Estate | NaN | News | Homes for Sale in Brooklyn, Manhattan and Staten Island | This week’s properties are in Downtown Brooklyn, the Flatiron district and Grymes Hill. | ['Real Estate and Housing (Residential)', 'Downtown Brooklyn (Brooklyn, NY)', 'Flatiron District (Manhattan, NY)', 'Grymes Hill (Staten Island, NY)'] | 130 | 2020-12-31 14:00:24+00:00 | 0 | nyt://article/6798b89f-8926-5e39-9d72-aa5f03eb02aa |
| 3987 | Sports | Sports | Pro Basketball | News | Becky Hammon Becomes First Woman to Serve as Head Coach in N.B.A. Game | She took over coaching the San Antonio Spurs after Gregg Popovich was ejected from a game against the Los Angeles Lakers on Wednesday night. | ['Basketball', 'National Basketball Assn', 'Hammon, Becky', 'San Antonio Spurs', 'Los Angeles Lakers'] | 597 | 2020-12-31 14:30:03+00:00 | 0 | nyt://article/2df71ddc-ac42-54a7-9af2-84ceeba85960 |
| 3988 | Arts&Leisure | Arts | Art & Design | News | Superheroes and Trailblazers: Black Comic Book Artists, Rediscovered | A new book examines the lives of these trailblazers, who paved the way for subsequent generations of illustrators but were invisible to the mainstream in their own time. | ['Art', 'Comic Books and Strips', 'Black People', 'Blacks', 'Quattro, Ken', 'Invisible Men: The Trailblazing Black Artists of Comic Books (Book)', 'Herriman, George (1880-1944)', 'Jackson, Jay Paul', 'Stoner, Elmer C', 'Greene, Sanford', 'Middleton, Owen Charles'] | 1593 | 2020-12-31 15:00:09+00:00 | 0 | nyt://article/ae47f0b2-c2ba-5a89-adb9-2b7f4ce1667c |
| 3989 | Science | Health | NaN | News | Here’s Why Distribution of the Vaccine Is Taking Longer Than Expected | Health officials and hospitals are struggling with a lack of resources. Holiday staffing and saving doses for nursing homes are also contributing to delays. | ['Vaccination and Immunization', 'Coronavirus (2019-nCoV)', 'Public-Private Sector Cooperation', 'States (US)', 'your-feed-healthcare'] | 1570 | 2020-12-31 15:26:56+00:00 | 1 | nyt://article/5320a2e9-d739-542a-a397-443c43231527 |
| 3990 | Editorial | Opinion | NaN | Op-Ed | What It Takes to Heal From Covid-19 | Survivors can get better, but they need help. | ['Chronic Condition (Health)', 'Coronavirus (2019-nCoV)', 'Health Insurance and Managed Care'] | 1002 | 2020-12-31 15:27:47+00:00 | 1 | nyt://article/e8adbb75-a8b3-5a8c-886b-b9c1195f607b |
| 3991 | Sports | Sports | Baseball | News | Padres Jolt M.L.B. With Bold Moves to Set Up World Series Run | While many teams continued to assess the financial consequences of the coronavirus pandemic, San Diego acquired two pricey pitchers and instantly became one of the favorites to win the World Series. | ['San Diego Padres', 'Major League Baseball', 'Free Agents (Sports)', 'Trades (Sports)', 'Darvish, Yu', 'Snell, Blake (1992- )'] | 1100 | 2020-12-31 15:47:44+00:00 | 0 | nyt://article/1f11417d-2c57-51b9-b75d-8f67f0a98ba9 |
| 3992 | Business | Business Day | NaN | News | Their Finances Ravaged, Customers Fear Banks Will Withhold Stimulus Checks | Banks have the power to decide whether to let overdrawn customers gain access to the stimulus money being deposited into their accounts, but they have taken different approaches. | ['Banking and Financial Institutions', 'Coronavirus Aid, Relief, and Economic Security Act (2020)', 'Stimulus (Economic)', 'Prices (Fares, Fees and Rates)', 'Personal Finances'] | 1429 | 2020-12-31 16:21:40+00:00 | 1 | nyt://article/c4b9edab-bdde-5d81-b496-06fedb527c39 |
| 3993 | Dining | Food | Wine, Beer & Cocktails | News | Should Wine Be Among Your Health Resolutions? | The new category of ‘clean wines’ is an effort to appeal to those seeking wellness. But why try to rationalize wine as a healthful product? | ['Wines', 'Grapes', 'Diet and Nutrition', 'Diaz, Cameron', 'Power, Katherine (1980- )', 'Avaline Ltd'] | 1307 | 2020-12-31 17:28:11+00:00 | 1 | nyt://article/efcaf652-ffad-5b4e-9f17-4fd9aff5b1ba |
| 3994 | Business | Technology | NaN | News | Microsoft Says Russian Hackers Viewed Some of Its Source Code | The hackers gained more access than the company previously revealed, though the attackers were unable to modify code or access emails. | ['Microsoft Corp', 'US Federal Government Data Breach (2020)', 'Cyberwarfare and Defense', 'Cyberattacks and Hackers', 'Computer Security', 'SolarWinds'] | 340 | 2020-12-31 18:02:02+00:00 | 1 | nyt://article/12048b2b-62e3-5bed-8c77-483a4299f465 |